Methods and systems for developing data flow programs

ABSTRACT

Methods, systems, and articles of manufacture consistent with the present invention provide a development tool that enables computer programmers to design and develop a data flow program for execution in a multiprocessor computer system. The tool allows the programmer to define a region divided into multiple blocks, wherein each block is associated with data operated on by code segments of the data flow program. The development tool also maintains dependencies among the blocks, each dependency indicating a relationship between two blocks that indicates that the portion of the program associated with a first block of the relationship needs the resultant data provided by the portions of the program associated with a second block of the relationship. The development tool supports several visualization steps, including displaying a directed acyclic graph representing the nodes and the dependencies.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is related to the following pending patent applications, and is a Continuation-in-Part of Ser. No. 09/244,138:

[0002] U.S. patent application Ser. No. 09/244,137, entitled “Method, Apparatus, and Article of Manufacture for Developing and Executing Data Flow Programs,” attorney docket no. 06502-0222-00000, and filed on Feb. 4, 2001.

[0003] U.S. patent application Ser. No. 09/244,138 entitled “Method, Apparatus, and Article of Manufacture for Developing and Executing Data Flow Programs, and Optimizing User Input Specifications”, attorney docket no. 06502-0223-00000, filed Feb. 4, 2001.

[0004] The entirety of each application is incorporated herein by reference.

FIELD OF THE INVENTION

[0005] This invention relates to the field of multiprocessor computer systems and, more particularly, to data driven processing of computer programs using a multiprocessor computer system.

BACKGROUND OF THE INVENTION

[0006] Multiprocessor computer systems include two or more processors that execute the instructions of a computer program. One processor executes a particular set of instructions while other processors execute different sets of instructions.

[0007] Fast computer systems, like multiprocessor computer systems, have stimulated the rapid growth of a new way of performing scientific research. The broad classical branches of theoretical science and experimental science have been joined by computational science. Computational scientists simulate on supercomputers phenomena too complex to be reliably predicted by theory and too dangerous or expensive to be reproduced in a laboratory. Successes in computational science have caused demand for supercomputing resources to rise sharply in recent years.

[0008] During this time, multiprocessor computer systems, also referred to as “parallel computers,” have evolved from experimental designs in laboratories to become the everyday tools of computational scientists who need the most advanced computing resources to solve their problems. Several factors have stimulated this evolution. It is not only that the speed of light and the effectiveness of heat dissipation impose physical limits on the speed of a single processor. It is also that the cost of advanced single-processor computers increases more rapidly than their power. And price/performance ratios become more favorable if the required computational power can be found from existing resources instead of purchased. This factor has caused many sites to use existing workstation networks, originally purchased to do modest computational chores, as “SCAN”s (SuperComputers At Night) by utilizing the workstation network as a parallel computer. This scheme has proven so successful, and the cost effectiveness of individual workstations has increased so rapidly, that networks of workstations have been purchased to be dedicated to parallel jobs that used to run on more expensive supercomputers. Thus, considerations of both peak performance and price/performance are pushing large-scale computing in the direction of parallelism. Despite these advances, parallel computing has not yet achieved widespread adoption.

[0009] The biggest obstacle to the adoption of parallel computing and its benefits in economy and power is the problem of inadequate software. The programmer of a program implementing a parallel algorithm for an important computational science problem may find the current software environment to be more of an obstruction than an aid to using the very capable, cost-effective hardware available. This is because computer programmers generally follow a “control flow” model when developing programs, including programs for execution by multiprocessor computer systems. According to this model, the computer executes a program's instructions sequentially (i.e., in series from the first instruction to the last instruction) as controlled by a program counter. Although this approach tends to simplify the program development process, it is inherently slow.

[0010] For example, when the program counter reaches a particular instruction in a program that requires the result of another instruction or set of instructions, the particular instruction is said to be “dependent” on the result and the processor cannot execute that instruction until the result is available. Moreover, executing programs developed under the control flow model on multiprocessing computer systems results in a significant waste of resources because of these dependencies. For example, a first processor executing one set of instructions in the control flow program may have to wait for some time until a second processor completes execution of another set of instructions, the result of which is required by the first processor to perform its set of instructions. Wait-time translates into an unacceptable waste of computing resources in that at least one of the processors is idle the whole time while the program is running.

[0011] To better exploit parallelism in a program some scientists have suggested use of a “data flow” model in place of the control flow model. The basic concept of the data flow model is to enable the execution of an instruction whenever its required operands become available, and thus, no program counters are needed in data-driven computations. Instruction initiation depends on data availability, independent of the physical location of an instruction in the program. In other words, instructions in a program are not ordered. The execution simply follows the data dependency constraints.

[0012] Programs for data-driven computations can be represented by data flow graphs. An example data flow graph is illustrated in FIG. 1 for the calculation of the following expression:

z=(x+y)*2

[0013] When, for example, x is 5 and y is 3, the result z is 16. As shown graphically in the figure, z is dependent on the result of the sum of x and y. The data flow graph is a directed acyclic graph (“DAG”) whose nodes correspond to operators and arcs are pointers for forwarding data. The graph demonstrates sequencing constraints (i.e., constraints with data dependencies) among instructions.
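The data-driven evaluation of this expression can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary encoding of the graph and the names `GRAPH`, `run`, `sum`, and `two` are assumptions made for the example and do not appear in FIG. 1. Each node fires as soon as all of its operands are available; no program counter orders the nodes.

```python
import operator

# Hypothetical encoding of the data flow graph for z = (x + y) * 2:
# each node maps to (operator, names of the arcs that feed it).
GRAPH = {
    "sum": (operator.add, ("x", "y")),
    "z": (operator.mul, ("sum", "two")),
}


def run(graph, inputs):
    """Fire every node whose operands are available until none remain."""
    values = dict(inputs)   # data tokens produced so far
    pending = dict(graph)   # nodes that have not fired yet
    while pending:
        ready = [name for name, (_, args) in pending.items()
                 if all(arg in values for arg in args)]
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        for name in ready:
            op, args = pending.pop(name)
            values[name] = op(*(values[arg] for arg in args))
    return values


result = run(GRAPH, {"x": 5, "y": 3, "two": 2})
print(result["z"])  # prints 16
```

The node `z` cannot fire in the first pass because its operand `sum` does not exist yet, which is the sequencing constraint the graph expresses.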

[0014] For example, in a conventional computer, program analysis is often done (i) when a program is compiled to yield better resource utilization and code optimization, and (ii) at run time to reveal concurrent arithmetic logic activities for higher system throughput. For instance, consider the following sequence of instructions:

[0015] 1. P=X+Y

[0016] 2. Q=P/Y

[0017] 3. R=X*P

[0018] 4. S=R−Q

[0019] 5. T=R*P

[0020] 6. U=S/T

[0021] The following five computational sequences of these instructions are permissible to guarantee the integrity of the result when executing the instructions on a serial computing system (e.g., a uniprocessor system):

[0022] 1, 2, 3, 4, 5, 6

[0023] 1, 3, 2, 4, 5, 6

[0024] 1, 2, 3, 5, 4, 6

[0025] 1, 3, 2, 5, 4, 6

[0026] 1, 3, 5, 2, 4, 6

[0027] For example, the first instruction must be executed first, but the second or third instruction can be executed second, because the result of the first instruction is required for either the second or third instruction, but neither the second nor the third requires the result of the other. The remainder of each sequence follows the rule that no instruction can be executed until its operands (or inputs) are available.
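The rule stated above can be checked mechanically. The following Python sketch, offered only as an illustration, encodes each of the six instructions as the variable it writes and the operands it reads, then tests every permutation against the rule that operands must be available before an instruction executes. The names `INSTRUCTIONS`, `INPUTS`, and `is_valid` are assumptions introduced for the example.

```python
from itertools import permutations

# Instruction number -> (variable written, operands read).
INSTRUCTIONS = {
    1: ("P", ("X", "Y")),
    2: ("Q", ("P", "Y")),
    3: ("R", ("X", "P")),
    4: ("S", ("R", "Q")),
    5: ("T", ("R", "P")),
    6: ("U", ("S", "T")),
}
INPUTS = {"X", "Y"}  # values available before any instruction runs


def is_valid(order):
    """Return True if every instruction finds its operands already computed."""
    available = set(INPUTS)
    for number in order:
        target, operands = INSTRUCTIONS[number]
        if not all(operand in available for operand in operands):
            return False
        available.add(target)
    return True


valid_orders = [order for order in permutations(INSTRUCTIONS) if is_valid(order)]
for order in valid_orders:
    print(order)
print(len(valid_orders))  # prints 5
```

Of the 720 possible orderings, only the five sequences listed above satisfy the operand rule.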

[0028] In a multiprocessor computer system with two processors, however, it is possible to perform the six operations in four steps (instead of six) with the first processor computing step 1, followed by both processors simultaneously computing steps 2 and 3, followed by both processors simultaneously computing steps 4 and 5, and finally either processor computing step 6. This is an obvious improvement over the uniprocessor approach because execution time is reduced.
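The four-step schedule can be reproduced with a simple greedy scheduler. This is a sketch under stated assumptions, not a method taken from the text: the dependency table `DEPS` is derived by hand from the operands of the six instructions, and the function `schedule` is a hypothetical helper that, at each time step, runs up to one ready instruction per processor.

```python
# Instruction number -> instructions whose results it needs.
DEPS = {1: [], 2: [1], 3: [1], 4: [2, 3], 5: [1, 3], 6: [4, 5]}


def schedule(deps, processors):
    """Greedily group ready instructions into time steps of at most `processors`."""
    done, steps = set(), []
    while len(done) < len(deps):
        ready = sorted(n for n in deps
                       if n not in done and all(d in done for d in deps[n]))
        if not ready:
            raise ValueError("dependencies contain a cycle")
        batch = ready[:processors]  # one instruction per processor
        steps.append(batch)
        done.update(batch)
    return steps


print(schedule(DEPS, 2))  # prints [[1], [2, 3], [4, 5], [6]]
print(schedule(DEPS, 1))  # prints [[1], [2], [3], [4], [5], [6]]
```

With two processors the six instructions finish in four steps; with one processor the same scheduler falls back to six sequential steps.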

[0029] Using data flow as a method of parallelization will thus extract the maximum amount of parallelism from a system. Most source code, however, is in a control flow form, which is difficult and clumsy to parallelize efficiently for all types of problems.

[0030] It is therefore desirable to provide a facility for programmers to more easily develop, visualize, debug, and optimize data flow programs and to convert existing control flow programs into data flow programs for execution on multiprocessor computer systems.

SUMMARY OF THE INVENTION

[0031] Methods, systems, and articles of manufacture consistent with the present invention facilitate development (e.g., visualization, debugging and optimization) of new programs according to the data flow model. According to one aspect of the present invention, such methods, systems, and articles of manufacture, as embodied and broadly described herein, include a development tool that implements a block dependency approach that allows an operator to define a memory region and divide the memory region into multiple blocks. Each block is associated with data (e.g., a matrix) needed by a function or other program operation, as well as code that operates on that data. It is noted that a “block” refers to one or more data elements in memory and does not imply a particular shape (e.g., square or rectangular) for the data elements or their placement in the memory. In other words, a block refers to a portion of data in memory, but does not necessarily indicate the structure or arrangement of the data in the memory. Additionally, the operator specifies any dependencies among the blocks, for example, a subsequent block may be specified dependent on an initial block. Such a dependency indicates that, before executing, the code associated with the subsequent block needs the code associated with the initial block to execute on the data associated with the initial block. As will be explained in detail below, the development tool facilitates development (including visualization, debugging, and optimization) of data flow programs using the block dependency approach outlined above.

[0032] Methods, systems, and articles of manufacture consistent with the present invention overcome the shortcomings of the related art, for example, by providing a data flow program development tool. The development tool allows a programmer to visually identify data dependencies between code segments, observe the execution of a data flow program under development, insert breakpoints, and modify data block code and data assignments and dependencies. Thus, a programmer may more easily develop a new data flow program or convert a control flow program to the data flow paradigm.

[0033] In accordance with methods consistent with the present invention, a method is provided for developing data flow programs. The method includes dividing a memory area into blocks, assigning data to the blocks, and assigning code segments of a program to the blocks. The method further includes determining dependencies between blocks and displaying a graph representing the dependency relationship between the blocks.

[0034] In accordance with methods consistent with the present invention, a method is provided for developing data flow programs. The method includes dividing a memory area that extends over a data set into blocks, for each block in the memory area, associating data from the data set with the block, and for each block in the memory area, associating a code segment with the block. The method further includes maintaining data read and write information for each code segment, determining dependencies between data blocks based on the read and write information, and displaying a directed acyclic graph, the directed acyclic graph comprising nodes and arcs, each node representing at least one block, and each arc representing a dependency relationship between a first node and a second node. As threads execute code segments, the method changes the presentation of the nodes and arcs to indicate unexecuted nodes using an unexecuted visualization, executing nodes using an executing visualization, executed nodes using an executed visualization, satisfied dependency arcs using a satisfied dependency visualization, and unsatisfied dependency arcs using an unsatisfied dependency visualization.

[0035] In accordance with systems consistent with the present invention, a data processing system is provided for developing data flow programs. The data processing system includes a memory comprising a data flow development tool comprising instructions that associate data processed by a data flow program to blocks in memory, associate code segments of the data flow program to blocks, determine dependencies between blocks that give rise to an execution order for the blocks, and display a graph of nodes and arcs depicting dependency relationships between the blocks. The data processing system further includes a processing unit that runs the data flow development tool.

[0036] In accordance with articles of manufacture consistent with the present invention, a computer readable medium is provided. The computer readable medium contains instructions that cause a data processing system to perform a method for developing data flow programs. The method includes dividing a memory area into blocks, assigning data to the blocks, and assigning code segments of a program to the blocks. The method further includes determining dependencies between blocks and displaying a graph representing the dependency relationship between the blocks.

[0037] In accordance with articles of manufacture consistent with the present invention, a computer readable medium is provided that is encoded with a data structure accessed by a data flow development tool run by a processor in a data processing system. The data structure includes nodes assigned to data processed by a data flow program and to code segments of the data flow program and dependencies between nodes.

[0038] Other apparatus, methods, features and advantages of the present invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0039] FIG. 1 depicts an example data flow graph for the calculation of an expression.

[0040] FIG. 2 depicts a block diagram illustrating an example of a memory region.

[0041] FIGS. 3A and 3B depict block diagrams illustrating an example of dependency relationships among the blocks of the memory region illustrated in FIG. 2.

[0042] FIG. 4 depicts an example of a directed acyclic graph illustrating the dependency relationships shown in FIGS. 3A and 3B.

[0043] FIG. 5 depicts a block diagram of an exemplary data processing system suitable for use with methods and systems consistent with the present invention.

[0044] FIG. 6 depicts a flow chart of the steps performed by a data flow program development tool.

[0045] FIG. 7 depicts an example of a queue reflecting an order of execution of memory region blocks by a data flow program.

[0046] FIG. 8 depicts a block diagram of an exemplary multiprocessor computer system suitable for use with methods and systems consistent with the present invention.

[0047] FIG. 9 depicts a flow chart of the steps performed during execution of a data flow program.

[0048] FIGS. 10A, 10B, and 10C depict an execution cycle of a data flow program.

[0049] FIG. 11 is an exemplary memory region containing a block with an array of elements.

[0050] FIGS. 12A, 12B, 12C, and 12D illustrate the creation of dependencies between blocks.

[0051] FIGS. 13-15 each shows three exemplary memory regions having blocks assigned to distribution groups.

[0052] FIG. 16 illustrates a movement technique for assigning blocks to nodes.

[0053] FIG. 17 depicts an example of a directed acyclic graph illustrating the dependency relationships shown in FIGS. 3A and 3B.

[0054] FIG. 18 depicts a flow chart of the steps performed by the data flow program development tool for graphically presenting execution of a data flow program.

[0055] FIGS. 19-25 depict the directed acyclic graph presented in FIG. 17 during the processing of the blocks in the directed acyclic graph.

[0056] FIG. 26 depicts a flow diagram of the steps performed by the data flow program development tool when determining dependencies for a selected node.

[0057] FIG. 27 depicts a flow diagram of the steps performed by the data flow program development tool when highlighting data affected by code segments.

[0058] FIG. 28 depicts a flow diagram of the steps performed by the data flow program development tool when displaying the nodes executed by selected threads.

[0059] FIG. 29 depicts a flow diagram of the steps performed by the data flow program development tool when stepping to a selected node.

[0060] FIG. 30 depicts a flow diagram of the steps performed by the data flow program development tool when single stepping data flow program execution.

[0061] FIG. 31 illustrates a flow diagram of the steps performed by the data flow program development tool when saving and replaying data flow program execution.

[0062] FIG. 32 illustrates a flow diagram of the steps performed by the data flow program development tool when adding or deleting dependencies from a DAG.

[0063] FIG. 33 illustrates a flow diagram of the steps performed by the data flow program development tool when setting and testing for breakpoints.

[0064] FIG. 34 illustrates a DAG with a breakpoint.

[0065] FIG. 35 illustrates a DAG after execution stopped by a breakpoint.

DETAILED DESCRIPTION OF THE INVENTION

[0066] Reference will now be made in detail to an implementation consistent with the present invention as illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the following description to refer to the same or like parts. Certain aspects of the present invention are summarized below before turning to the figures.

[0067] Methods, systems, and articles of manufacture consistent with the present invention enable programmers to develop new data flow programs and to convert existing control flow programs to the data flow paradigm. To that end, the methods, systems, and articles of manufacture may implement a data flow program development tool.

[0068] Data flow programs developed in accordance with the principles of the present invention may be executed on a multiprocessor computer system or a distributed computer system using the data flow model. The development tool may execute on the same or different data processing system from that used for executing the data flow program under development.

[0069] Generally, the development tool facilitates dividing a memory region into blocks. Each block is associated with certain data and code, with dependencies specified between blocks. As will be explained in more detail below, blocks that do not depend on one another can be executed in parallel, while blocks that do depend on one another await the completion of code execution and data manipulation of the block on which they depend.

[0070] Dependencies are reflected as conceptual links between dependent blocks and the precursor blocks on which they depend. A dependent block is dependent on a precursor block when the dependent block needs the result of the precursor block in order for the dependent block to execute successfully. As will be shown below, dependency relationships may be viewed graphically using a directed acyclic graph (“DAG”). Nodes in the graph correspond to blocks of the memory region, and thus the program code and data assigned to the blocks.

[0071] During execution, the code associated with the blocks is queued for processing in a multiprocessor data processing system, for example, by placing block pointers in a queue. Each processor may further execute multiple threads that can individually process blocks. In one implementation, the blocks are queued according to the dependency information associated with each block. Additional information may also affect the ordering of blocks in the queue, including priority information, and the like.

[0072] The programmer may designate the number of threads available to process the blocks. For example, the programmer may designate two threads per processor. Each thread may, for example, maintain a program counter and temporary memory, as needed, to perform the code associated with the blocks.

[0073] Each thread, in turn, selects a block from the queue and executes the program code designated by the programmer for that block. As long as there are blocks in the queue, the threads, when available, select blocks and execute the associated program code. Threads select queued blocks for execution in a manner that reflects block dependency information. To that end, when an available thread selects a queued block for execution, the thread first examines the dependency information for that block. When the block or blocks on which the selected block depends have completed execution, then the thread can proceed to execute the program code for the selected block. Otherwise, the thread may enter a wait state until it can begin executing the program code for the selected block.

[0074] Alternatively, the thread may select the next available block in the queue, based on any priority if appropriate, and examine that block to determine its status with respect to any blocks upon which it depends. Processing continues until the threads have completed executing the program code associated with all blocks in the queue. Note that while the multiprocessor data processing system may exist as a single physical unit, the threads may be distributed over multiple processors across multiple data processing systems, for example, across a LAN or WAN.
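The thread behavior described in the two preceding paragraphs can be sketched with Python's standard `threading` module. This is a minimal illustration, not the tool's implementation: the function `run_blocks`, its parameters, and the use of a condition variable for the wait state are all assumptions made for the example. Each worker claims the first queued block whose precursor blocks have completed (the "next available block" alternative) and waits when no queued block is ready. The sketch assumes the dependencies are acyclic and refer only to blocks in the queue.

```python
import threading


def run_blocks(queue, deps, code, num_threads=2):
    """Run code(block) for every queued block, honoring block dependencies.

    queue: blocks in queue order; deps: block -> blocks it depends on.
    Returns the blocks in the order in which they completed.
    """
    pending = list(queue)   # queued blocks not yet claimed by a thread
    done = set()            # blocks whose code has finished executing
    completed = []
    cond = threading.Condition()

    def worker():
        while True:
            with cond:
                while True:
                    if not pending:
                        return  # queue is empty, so this thread is finished
                    block = next((b for b in pending
                                  if all(d in done for d in deps.get(b, ()))),
                                 None)
                    if block is not None:
                        pending.remove(block)
                        break
                    cond.wait()  # wait state: no queued block is ready yet
            code(block)          # execute the block's code outside the lock
            with cond:
                done.add(block)
                completed.append(block)
                cond.notify_all()  # wake threads waiting on this block

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed


# Usage: block "d" needs "b" and "c", which both need "a".
example_deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(run_blocks(["a", "b", "c", "d"], example_deps, lambda block: None))
```

The relative completion order of "b" and "c" may vary between runs, since neither depends on the other.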

[0075] The description below provides a detailed explanation of the methods, systems, and articles of manufacture consistent with the present invention.

[0076] At the beginning of the design and development process, a programmer specifies a memory region and divides the memory region into blocks using, for example, a graphical user interface component of the development tool. Below, the development tool will generally be described in the context of developing a data flow program for matrix manipulation. However, it is noted that the data elements assigned to blocks may be scalars, structures, or any other type of data element.

[0077] FIG. 2 shows an example of a memory region 200 that contains sixteen blocks arranged in a four-by-four matrix, with each block identified by a row number and column number. For example, the block in the upper left corner of memory region 200 is labeled (1,1) indicating that it is located in the first row and the first column, and the block in the lower right hand corner of region 200 is labeled (4,4) indicating that it is located in the fourth row and the fourth column. Each block contains a data set, such as a matrix or array of values or information, to be processed in accordance with certain program code. As an example, the memory region 200 may represent a 100×100 matrix of scalars, with each block representing a 25×25 subarray of the larger matrix. Although the memory region 200 and the blocks are shown as regular squares, the scalars need not be located contiguously in memory. Rather, the development tool presents the memory region 200 and the blocks to the programmer as shown in FIG. 2 as a user friendly view of the data that the data flow program will work with.
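The 100×100 example can be made concrete with a short Python sketch that maps each block label to the rows and columns of the larger matrix that it covers. The function name `block_bounds` and the use of zero-based index ranges are assumptions made for this illustration; the block labels follow the one-based (row, column) convention of FIG. 2.

```python
def block_bounds(n=100, block=25):
    """Map 1-based (row, column) block labels to the index ranges they cover."""
    per_side = n // block  # 4 blocks per side for a 100x100 matrix
    return {
        (r + 1, c + 1): (range(r * block, (r + 1) * block),
                         range(c * block, (c + 1) * block))
        for r in range(per_side)
        for c in range(per_side)
    }


bounds = block_bounds()
print(len(bounds))     # prints 16
print(bounds[(1, 1)])  # prints (range(0, 25), range(0, 25))
print(bounds[(4, 4)])  # prints (range(75, 100), range(75, 100))
```

The mapping describes only the logical view of the data; as noted above, it says nothing about where the scalars actually reside in memory.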

[0078] After defining the memory region and dividing it into blocks, the programmer specifies a state for each block. The state of a block generally corresponds to the program code that the programmer assigns to that block. In other words, the assigned code is a portion of a program that the programmer intends to operate on the data in the block. The interface provides the programmer with a window or other input facility to provide the program code for a block and internally tracks the assignment of code to the blocks.

[0079] In the example region 200, the group of blocks 202 labeled (1,1), (2,1), (3,1), and (4,1) share a first state, the group of blocks 204 labeled (1,2), (1,3), and (1,4) share a second state, and the group of blocks 206 labeled (2,2), (2,3), (2,4), (3,2), (3,3), (3,4), (4,2), (4,3), and (4,4) share a third state. Although the region 200 and the blocks 202-206 are shown as being uniform in size, in practice a memory region and blocks may have different shapes and sizes, hold different types of data, and be distributed in memory contiguously or non-contiguously.

[0080] Next, the programmer specifies dependency relationships between the blocks. A dependency relationship exists when the code associated with a first block is dependent upon the result or final state of the data assigned to a second block. Thus, the code assigned to the first block needs to wait for execution of the code assigned to the second block. FIGS. 3A and 3B illustrate examples of dependency relationships between blocks in the memory region 200 of FIG. 2. As shown in FIG. 3A, each of the blocks labeled (1,2), (1,3), and (1,4) is dependent on the blocks labeled (1,1), (2,1), (3,1), and (4,1). Thus, the blocks labeled (1,1), (2,1), (3,1), and (4,1) provide results needed by the blocks (1,2), (1,3), and (1,4).

[0081] Similarly, FIG. 3B illustrates dependencies among each of the blocks labeled (1,2), (1,3), and (1,4) and the blocks labeled (2,2), (2,3), (2,4), (3,2), (3,3), (3,4), (4,2), (4,3), and (4,4). As shown, the block labeled (1,2) is assigned data needed by the blocks in the same column labeled (2,2), (3,2), and (4,2); the block labeled (1,3) is assigned data needed by the blocks in the same column labeled (2,3), (3,3), and (4,3); and the block labeled (1,4) is assigned data needed by the blocks in the same column labeled (2,4), (3,4), and (4,4). FIGS. 3A and 3B illustrate examples of dependencies for the memory region 200; a programmer may, of course, specify many other dependencies as necessary to reflect the data processing structure of a data flow program under development.
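The dependencies of FIGS. 3A and 3B, as described in the text, can be written down as a plain mapping from each block to the blocks on which it depends. The following Python sketch is illustrative only; the variable names `FIRST`, `SECOND`, `THIRD`, and `deps` are assumptions chosen to mirror the three states of the example region 200.

```python
# Blocks grouped by state, using the (row, column) labels of FIG. 2.
FIRST = [(r, 1) for r in range(1, 5)]                        # (1,1) to (4,1)
SECOND = [(1, c) for c in range(2, 5)]                       # (1,2) to (1,4)
THIRD = [(r, c) for r in range(2, 5) for c in range(2, 5)]   # (2,2) to (4,4)

# block -> list of precursor blocks it depends on
deps = {block: [] for block in FIRST}
# FIG. 3A: every second-state block needs all four first-state blocks.
deps.update({block: list(FIRST) for block in SECOND})
# FIG. 3B: every third-state block needs the top block of its own column.
deps.update({(r, c): [(1, c)] for (r, c) in THIRD})

print(deps[(1, 3)])  # prints [(1, 1), (2, 1), (3, 1), (4, 1)]
print(deps[(3, 2)])  # prints [(1, 2)]
print(sum(len(p) for p in deps.values()))  # prints 21
```

The final count reflects 12 dependencies from FIG. 3A (three blocks, four precursors each) plus 9 from FIG. 3B.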

[0082] Note also that the development tool may provide a dependency analysis component. The dependency analysis component examines program code to identify code that reads or writes specific data. Thus, the dependency analysis component may automatically insert dependencies between blocks when the programmer specifies the code to be assigned to each block. To that end, the development tool may build a separate step tree.

[0083] The step tree is a data structure that represents program execution as a series of steps. The programmer adds steps to the tree, and specifies to the development tool which data objects that particular step reads or writes. For example, the programmer may use data read and data write identifiers (e.g., pointers or handles) to specify the data. The programmer further specifies a code section executed at that step. As steps are added, the step tree grows and maintains the order of the steps, and thus the order and dependencies for data objects needed by the code sections associated with the steps. The development tool may then parse the step tree to automatically extract block dependencies.
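One way such an extraction could work is sketched below in Python. The text does not specify the step tree's layout or the parsing rule, so everything here is an assumption made for illustration: the steps are flattened into an ordered list of (name, reads, writes) tuples, and a step is taken to depend on the most recent earlier step that wrote a data object it reads. Only this read-after-write relationship is tracked; other orderings, such as a write that must follow an earlier read, are ignored in the sketch.

```python
def extract_dependencies(steps):
    """Derive step dependencies from per-step read and write information.

    steps: ordered list of (step name, data read, data written).
    Returns a dict mapping each step name to the steps it depends on.
    """
    last_writer = {}  # data identifier -> most recent step that wrote it
    deps = {}
    for name, reads, writes in steps:
        deps[name] = sorted({last_writer[d] for d in reads if d in last_writer})
        for d in writes:
            last_writer[d] = name
    return deps


# Hypothetical steps: s2 and s3 read what s1 wrote; s4 reads both results.
steps = [
    ("s1", [], ["A"]),
    ("s2", ["A"], ["B"]),
    ("s3", ["A"], ["C"]),
    ("s4", ["B", "C"], ["D"]),
]
print(extract_dependencies(steps))
# prints {'s1': [], 's2': ['s1'], 's3': ['s1'], 's4': ['s2', 's3']}
```

Steps `s2` and `s3` share no dependency on each other, so they are candidates for parallel execution.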

[0084] The development tool constructs a DAG using the dependency information. FIG. 4 presents an example of a DAG 400 illustrating the dependency relationships shown in FIGS. 3A and 3B. The DAG 400 illustrates graphically that the processed data associated with all of the blocks sharing the first state is needed by the code associated with the blocks sharing the second state. In turn, the processed data associated with the blocks sharing the second state is needed by particular blocks that share the third state. The development tool may use the DAG 400 to order the blocks for processing as explained below.

[0085] FIG. 5 depicts an exemplary data processing system 500 suitable for practicing methods and implementing systems consistent with the present invention. The data processing system 500 includes a computer system 510 connected to a network 570, such as a Local Area Network, Wide Area Network, or the Internet.

[0086] The computer system 510 includes a main memory 520, a secondary storage device 530, a central processing unit (CPU) 540, an input device 550, and a video display 560. The main memory 520 contains a data flow program development tool 522 and a data flow program 524. The memory also holds a data flow DAG 526 and a step tree 528. The data flow program development tool 522 provides the interface for designing and developing data flow programs, including programs that utilize control flow program code. Using display 560, the development tool 522 enables programmers to design memory regions, such as region 200 of FIG. 2, and divide the regions into blocks with corresponding states. The tool further enables programmers to write program code to operate on each of the blocks using a multiprocessor computer system (see FIG. 8).

[0087] The data flow program 524 represents a program designed in accordance with the data flow paradigm developed by the data flow tool 522. The data flow program 524 includes, for example, information specifying a memory region, the blocks of the region, the program code associated with each block, and dependency relationships between the blocks.

[0088] Although aspects of one implementation are depicted as being stored in memory 520, one skilled in the art will appreciate that all or part of systems and methods consistent with the present invention may be stored on or read from other computer-readable media, such as secondary storage devices, like hard disks, floppy disks, and CD-ROM; a carrier wave received from a network such as the Internet; or other forms of ROM or RAM. Finally, although specific components of data processing system 500 have been described, one skilled in the art will appreciate that a data processing system suitable for use with methods and systems consistent with the present invention may contain additional or different components.

[0089]FIG. 6 is a flow chart of the process 600 performed by thedevelopment tool 522 interacting with programmers to construct data flowprograms. After a programmer initiates execution of the development tool522, the development tool 522 displays one or more windows that theprogrammer uses to construct a data flow program. First, the developmenttool 522 displays a window in which the programmer defines a memoryregion (step 610). The programmer uses the development tool 522 todivide the region into blocks (step 620).

[0090] As long as there are blocks in a region to be processed (step630), the programmer selects a block (step 640), identifies any otherblock(s) that influence the selected block's final state (in otherwords, block(s) upon which the selected block is dependent) (step 650),and specifies the program code for each block, for example, a portion ofan existing control flow program (step 660). In this manner, an existingcontrol flow program may be converted to a data flow paradigm. Note,however, that the programmer may instead write new code for each blockas part of the process of constructing a new data flow program.

[0091] After all of the blocks have been processed (steps 640 to 660), the programmer establishes the dependency relationships among the blocks by graphically linking them together (step 670). Alternatively or additionally, as explained above, the programmer may add steps to the step tree, and instruct the development tool 522 to automatically extract dependencies. In other words, with the steps described above, the development tool 522 first assists the programmer in defining a problem to be solved. Subsequently, the development tool 522 produces source files that can be compiled and run (step 675). The source files include code that (at run-time) produces in memory a DAG with the nodes and dependencies defined according to the steps set forth above. During run-time, the nodes are placed on a queue (step 680). The nodes thus form the basis for parallel execution.

[0092] The development tool 522 uses the dependency/link information to queue the blocks in a manner that reflects an acceptable order for processing. For example, a first block dependent upon a second block may be placed in the queue after the second block. For the example shown in FIGS. 2-4, the blocks may be queued in the manner shown in FIG. 7 with the blocks sharing the first state 202, (1,1), (2,1), (3,1), and (4,1), queued before the blocks with the second state 204, (1,2), (1,3), and (1,4), and followed by the blocks sharing the third state 206, (2,2), (2,3), (2,4), (3,2), (3,3), (3,4), (4,2), (4,3), and (4,4).
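The queueing described above may be illustrated with a short Python sketch. It is not the tool's actual implementation: the function name `build_queue` and the dependency table are hypothetical, and the sketch assumes that every second-state block depends on every first-state block and every third-state block depends on every second-state block.

```python
from collections import deque

# Blocks of the 4x4 example, grouped by state (row, column).
first = [(1, 1), (2, 1), (3, 1), (4, 1)]
second = [(1, 2), (1, 3), (1, 4)]
third = [(r, c) for r in range(2, 5) for c in range(2, 5)]

# deps[b] lists the blocks whose results b needs (assumed for illustration).
deps = {b: [] for b in first}
deps.update({b: list(first) for b in second})
deps.update({b: list(second) for b in third})


def build_queue(deps):
    """Return the blocks in an order where each block follows its parents."""
    remaining = {b: len(parents) for b, parents in deps.items()}
    dependents = {b: [] for b in deps}
    for block, parents in deps.items():
        for parent in parents:
            dependents[parent].append(block)

    ready = deque(b for b in deps if remaining[b] == 0)
    queue = []
    while ready:
        block = ready.popleft()
        queue.append(block)
        for child in dependents[block]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(queue) != len(deps):
        raise ValueError("dependencies contain a cycle")
    return queue


queue = build_queue(deps)
```

Under these assumptions the resulting order is the first-state blocks, then the second-state blocks, then the third-state blocks, which matches the ordering described for FIG. 7.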

[0093] As noted above, the data flow program under development may be executed in a multiprocessor data processing system. The multiprocessor data processing system may take many forms, ranging from a single multiprocessor desktop computer to network distributed computer systems with many nodes. FIG. 8 illustrates one implementation of a multiprocessor data processing system 810.

[0094] The data processing system 810 includes a network interface 820 that allows a programmer to transfer the data flow program from the development tool environment (e.g., FIG. 5) for execution in multiprocessor computer system 810. Alternatively, the development tool 522 may execute on the same data processing system 810 on which the data flow program will execute.

[0095] The data processing system 810 includes shared memory 830 and multiple processors 840a, 840b, . . . 840n. The number and type of processors may vary depending on the implementation. As one example, a Sun Microsystems HPC Server with a multiple processor configuration may be used as the data processing system. Processes execute independently on each of the processors in the data processing system 810. A process in this context may include threads controlling execution of program code associated with a block of a data flow program developed using tool 522.

[0096] Turning next to FIG. 9, the operation of a data flow program in accordance with the present invention will now be described with reference to the process 900. Multiple threads are used to process the code associated with the blocks of the data flow program. The number of threads may vary depending on the implementation. As examples, the programmer may specify one thread per processor, or the data processing system 810 may determine the number of threads based on the number of available processors and an analysis of the data flow program.

[0097] If a thread is available to process the code associated with a block (step 910), the thread determines whether there are any blocks in the queue (step 920). If so, the available thread selects a block from the queue for processing (step 930). Typically, the blocks are selected from the queue based on the order in which they were placed in the queue. If, however, a thread determines that a selected block is dependent upon a block associated with code that has not yet been executed (step 940), the thread skips the selected block (step 950). Otherwise, when the block dependencies for the selected block have been satisfied (step 940), the thread uses an assigned processor to execute the program code associated with the selected block (step 960). Processing generally continues until the threads have executed the code associated with each block in the queue (step 920).
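The selection and skipping behavior of process 900 may be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the runtime of the invention: the names `run_blocks`, `claim`, and `worker` are hypothetical, the dependency table is a small made-up example, and the dependencies are assumed to be acyclic.

```python
import threading


def run_blocks(queue, deps, code, num_threads=4):
    """Run code(block) for every queued block, skipping blocks that are not ready."""
    pending = list(queue)   # blocks not yet claimed, kept in queue order
    done = set()            # blocks whose code has finished executing
    finished = []           # completion order, recorded for inspection
    cond = threading.Condition()

    def claim():
        # Caller holds cond. Blocks with unmet dependencies are skipped
        # (steps 940 and 950); the first ready block is taken (step 930).
        for block in pending:
            if all(parent in done for parent in deps[block]):
                pending.remove(block)
                return block
        return None

    def worker():
        while True:
            with cond:
                block = claim()
                while block is None:
                    if not pending:
                        return          # queue exhausted (step 920)
                    cond.wait()         # wait for another block to complete
                    block = claim()
            code(block)                 # execute the block's code (step 960)
            with cond:
                done.add(block)
                finished.append(block)
                cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished


# Hypothetical example: "c" needs "a" and "b"; "d" needs "c".
example_deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
order = run_blocks(["a", "b", "c", "d"], example_deps, code=lambda block: None)
```

The completion order may differ from run to run, but every block completes after the blocks on which it depends.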

[0098] In a manner consistent with operation of the process 900, FIGS. 10a-c illustrate a portion of the queue of FIG. 7, including the first five blocks of the memory region 200 queued for processing. As shown in FIG. 10a, each thread processes a selected block using one of the processors. In this example, there are four threads and four processors. When a thread completes processing (shown for example in FIG. 10b where a thread completes program execution of the block labeled (1,1)), the thread attempts to execute the next available block in the queue, in this case, the block labeled (1,2). However, the thread does not proceed to execute because the block labeled (1,2) is dependent upon the final state of other blocks still being executed, namely, blocks (2,1), (3,1), and (4,1).

[0099] Once execution of the program code for the above-noted blocks has completed, as shown in FIG. 10c, a thread can continue processing with block (1,2). Instead of remaining idle, a thread may skip ahead to process other queued blocks when the dependency relationships for those queued blocks are met. Also, although FIG. 10 shows four threads and four processors, more or fewer threads or processors may be used depending upon the particular implementation.

[0100] The following description sets forth additional specifications the user may supply while developing a data flow program. In one implementation, the user may further specify the memory regions by inputting into the development tool 522 the following control flow variables and parameters:

[0101] Name: A unique name

[0102] Kind: Determines whether the memory region is an input to the problem, an output, input and output, or temporary space used only during evaluation of the problem.

[0103] Type: Corresponds to the data type of the elements of the memory region, for example, integer, real, and the like.

[0104] Dimensions: 0 for a scalar, 1 for a vector, 2 for a matrix. Higher dimensions may also be used.

[0105] Size: A size for each dimension of the memory region.

[0106] Grid: A size for each dimension of the blocks in a memory region.

[0107] Leading dimension: The size of the first dimension of matrices (when a memory region is larger than the matrix it holds).
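As an illustration only, the parameters listed above may be gathered into a single record. The following Python sketch uses hypothetical field names and assumes that a partially filled block at the edge of a region still counts as a block; neither detail is taken from the development tool 522.

```python
from dataclasses import dataclass
from math import ceil
from typing import Optional, Tuple


@dataclass
class MemoryRegion:
    name: str                       # unique name
    kind: str                       # "input", "output", "input_output", or "temporary"
    type: str                       # element data type, e.g. "integer" or "real"
    dimensions: int                 # 0 scalar, 1 vector, 2 matrix, ...
    size: Tuple[int, ...]           # elements per dimension of the region
    grid: Tuple[int, ...]           # elements per dimension of each block
    leading_dimension: Optional[int] = None

    def __post_init__(self):
        if len(self.size) != self.dimensions or len(self.grid) != self.dimensions:
            raise ValueError("size and grid need one entry per dimension")

    def blocks_per_dimension(self):
        # Assumption: edge blocks that are only partly filled are still counted.
        return tuple(ceil(s / g) for s, g in zip(self.size, self.grid))


region = MemoryRegion("A", "input", "real", 2, (40, 40), (10, 10), leading_dimension=40)
```

For this hypothetical 40×40 region with a 10×10 grid, `region.blocks_per_dimension()` evaluates to `(4, 4)`.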

[0108] In some applications under development, it may be useful for the program code that performs steps on the blocks to be able to access and manipulate the elements of a block. For example, when program code performs matrix manipulation operations, the program code may benefit from information concerning the matrices or sub-matrices stored in one or more blocks. Macros allow the programmer to write program code that will perform steps on the blocks at each node in the DAG. The macros access specific elements and attributes of a block in a memory region. Taking a block in a memory region as an argument, the macro may return, for instance, the number of rows or columns in the block, or the number of rows or columns in the memory region. The following table lists several exemplary macros that the programmer may apply in program code and that will act on a block in a memory region:

Macro                 Description
#AROW(OBJ)            evaluates to the absolute row of the first element in the block, the true index
#ACOL(OBJ)            evaluates to the absolute column of the first element in the block
#NROWS(OBJ)           the number of rows in the block
#NCOLS(OBJ)           the number of columns in the block
#ANROWS(OBJ)          the number of rows of elements in the memory region
#ANCOLS(OBJ)          the number of columns of elements in the memory region
#GROWS(OBJ)           the number of rows of elements per block
#GCOLS(OBJ)           the number of columns of elements per block
#RECROW(OBJ,INDEX)    converts INDEX, an absolute index based on the current level of recursion, to a true absolute index
#RECCOL(OBJ,INDEX)    converts INDEX, an absolute index based on the current level of recursion, to a true absolute index

[0109] FIG. 11 shows an exemplary memory region 1100 with blocks having elements arranged in a 10×10 fashion. Given this memory region 1100 with a block 1102 located as shown on the figure, the following macros evaluate for this block 1102 as shown in the following table:

Macro         Value
#ROW(A)       3
#COL(A)       2
#AROW(A)      21
#ACOL(A)      11
#NROWS(A)     10
#NCOLS(A)     10
#ANROWS(A)    40
#ANCOLS(A)    40
#GROWS(A)     10
#GCOLS(A)     10
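The values in the table above can be reproduced with simple arithmetic. The following Python sketch is an illustration under stated assumptions, not the macro expansion performed by the development tool 522: the function name `block_macros` is hypothetical, indices are taken to be 1-based, and a block at the edge of a region is assumed to be truncated to the region boundary.

```python
def block_macros(block_row, block_col, grid, size):
    """Evaluate the block macros for the block at (block_row, block_col)."""
    grows, gcols = grid        # elements per block in each dimension
    anrows, ancols = size      # elements in the whole memory region
    arow = (block_row - 1) * grows + 1   # absolute row of the first element
    acol = (block_col - 1) * gcols + 1   # absolute column of the first element
    return {
        "ROW": block_row,
        "COL": block_col,
        "AROW": arow,
        "ACOL": acol,
        # Assumption: an edge block holds only the elements that remain.
        "NROWS": min(grows, anrows - arow + 1),
        "NCOLS": min(gcols, ancols - acol + 1),
        "ANROWS": anrows,
        "ANCOLS": ancols,
        "GROWS": grows,
        "GCOLS": gcols,
    }


# Block 1102: block row 3, block column 2 of a 40x40 region with 10x10 blocks.
values = block_macros(3, 2, grid=(10, 10), size=(40, 40))
```

With these inputs, `values["AROW"]` is 21 and `values["ACOL"]` is 11, in agreement with the table.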

[0110] It should be noted that recursive program codes may be used in which the process repeatedly applies over a smaller region. In this case, the recursion stops when a base case is reached and the region becomes so small that there is not enough left to repeat the process. Specific program code can be associated with a recursive process that will only be executed for the base case. For example, assume that a recursive process is defined that moves over one block column and down one block row at each level of recursion. The following recursive macros evaluate at each level as shown in the following table:

Macro           Level 1   Level 2   Level 3
#RECROW(A,1)    1         11        21
#RECCOL(A,6)    6         16        26
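The recursive macro values above are consistent with shifting the index by one block per level of recursion. The Python sketch below shows that arithmetic; the function name `rec_index` is hypothetical, and the block extent of 10 elements is assumed from the 10×10 blocks of FIG. 11.

```python
def rec_index(index, level, block_extent):
    """Convert an index relative to the current recursion level to a true absolute index.

    Assumes the region shrinks by one block (block_extent elements) per level,
    with level 1 being the full region.
    """
    return index + (level - 1) * block_extent


recrow = [rec_index(1, level, 10) for level in (1, 2, 3)]   # #RECROW(A,1)
reccol = [rec_index(6, level, 10) for level in (1, 2, 3)]   # #RECCOL(A,6)
```

Here `recrow` is `[1, 11, 21]` and `reccol` is `[6, 16, 26]`, matching the table.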

[0111] Additionally, the programmer may designate program code as sub-DAG program code. The sub-DAG designation instructs the development tool 522 to build a sub-DAG for the code associated with a particular node. In other words, any node in a DAG may have, underlying it, another DAG specifically directed to the code associated with that node. Thus, the programmer may develop parallelism across a whole application, or inside smaller pieces of code. The programmer may view the resulting hierarchy of DAGs by inputting to the development tool 522 one or more DAGs that the development tool 522 should display.

[0112] As stated previously, dependencies are specified manually or automatically between blocks and denote which blocks need to be executed before other blocks. The dependencies, in turn, determine the connections between nodes in a DAG representing execution order. Often, several blocks in a memory region depend on several other blocks in the same memory region. Although in most instances automatic specification of dependencies (using the step tree explained above) is suitable, the development tool 522 further provides an input option that a programmer may use to quickly denote dependencies between multiple blocks.

[0113] FIG. 12A, for example, shows a programmer denoting a parent block 1202 for a set of blocks 1204 (or state) using a development tool 522 user interface (e.g., responsive to mouse and keyboard input). In this implementation, the parent block 1202 represents the starting upper left corner of a set of parent blocks to be designated. Then the programmer specifies whether the dependency on the parent block 1202 is fixed or free with respect to row and column.

[0114] FIGS. 12B-D illustrate different combinations of fixed and free designations given an exemplary dependent set of blocks 1204. If the programmer designates the dependency as fixed, all blocks in the dependent set of blocks 1204 depend on the processing of the parent block 1202 (FIG. 12A). If the dependency is free with respect to row, the block that is depended on varies as row location in the dependent set of blocks 1204 varies (from the upper left block) (FIG. 12B). Similarly, if the dependency is free with respect to column, the block that is depended on varies as column location in the dependent set of blocks 1204 varies (from the upper left block) (FIG. 12C). If the dependency is free with respect to row and column, the block that is depended on varies as location in the dependent set of blocks varies (FIG. 12D). Through this method of designating dependencies, the development tool 522 allows a programmer to designate multiple block dependencies manually and quickly.
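One way to read the fixed and free designations is as an offset rule: a free coordinate tracks the dependent block's offset from the upper left block of the dependent set, while a fixed coordinate stays at the parent block. The Python sketch below illustrates that reading; the function name `parent_block` and the block coordinates are hypothetical and are not taken from FIGS. 12A-D.

```python
def parent_block(dependent, dependent_origin, parent_origin,
                 row_free=False, col_free=False):
    """Return the (row, column) of the block that `dependent` depends on.

    dependent_origin is the upper left block of the dependent set and
    parent_origin is the designated parent block (upper left of the parent set).
    """
    row_offset = dependent[0] - dependent_origin[0] if row_free else 0
    col_offset = dependent[1] - dependent_origin[1] if col_free else 0
    return (parent_origin[0] + row_offset, parent_origin[1] + col_offset)


# Hypothetical 2x2 dependent set starting at block (2,2); parent block at (1,1).
dependents = [(r, c) for r in (2, 3) for c in (2, 3)]
fixed = {d: parent_block(d, (2, 2), (1, 1)) for d in dependents}
row_free = {d: parent_block(d, (2, 2), (1, 1), row_free=True) for d in dependents}
both_free = {d: parent_block(d, (2, 2), (1, 1), row_free=True, col_free=True)
             for d in dependents}
```

In this example every block in `fixed` maps to (1,1), while in `row_free` the blocks in the second row of the dependent set map to (2,1).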

[0115] For the purposes of assigning blocks to nodes in a DAG, the development tool 522 may provide either or both of a “distribution” mechanism and a “movement” mechanism. With regard first to “distributions”, the development tool 522 permits the programmer to assign certain types of “distributions” to sets of blocks in a memory region. The distributions then control the manner in which blocks are assigned to nodes in a DAG. The distributions may be used to flexibly group different blocks into a single node and consequently allow different parallel processing approaches to be used for execution of a problem.

[0116] For example, given that the result of a 3×3 matrix multiply problem is a 3×3 matrix, the programmer may first select 9 threads to operate on 9 nodes, one for each value in the resulting matrix. However, the programmer, as an alternate approach, may select 3 threads to process 3 nodes, one for each column in the resulting matrix. In the alternate approach, a node will contain more blocks but the data flow program will use fewer threads. The varying distributions give the programmer flexibility in testing different parallel processing techniques.

[0117] To designate a distribution, the programmer selects a rectangular area of the memory region to identify a set of blocks. In addition to determining the allocation of blocks to nodes, the distributions optionally control on which blocks macros operate. To this end, the development tool 522 may support two main categories of distributions: primary and secondary. The difference between primary and secondary distributions is that the development tool 522 may, if selected by the programmer, restrict macros to operate on blocks in primary distributions but not on blocks in secondary distributions. The primary distribution generally determines how many nodes there will be in the DAG for the data flow program under development. For a set of blocks that the programmer designates as a secondary distribution, the development tool adds each block in the set of blocks to the same node of the DAG.

[0118] Distributions may be categorized as “primary single”, “secondary multiple row,” “secondary multiple column,” “secondary all,” and “multiple” (either primary or secondary). Primary single distributions control how many DAG nodes are created. If a primary single distribution is present in a memory region, the development tool 522 will create one DAG node for each block in the distribution. Each block in a primary single distribution will enter its own node; no two blocks of a given primary single distribution will share the same node. The development tool 522 will also assign each block in additional primary single distributions (e.g., in additional memory regions) to the nodes in the DAG as well.

[0119] For all other types of distributions, the development tool 522 determines which block in the additional distribution is added to a DAG node through a process that can be conceptualized as visually placing the additional distribution over the primary single distribution. The block in the additional distribution that is in place over a primary single distribution block is added to the node containing that primary single distribution block.

[0120] Secondary distributions include secondary multiple row, secondary multiple column, and secondary all distributions. When a block in a secondary multiple row distribution is added to a node, then all of the blocks in the row of that block are also added to the node. Similarly, for secondary multiple column distributions, each block in the column is added. In secondary all distributions, when a block in the distribution is added to a node, every block in the distribution is added to the node.

[0121] Multiple distributions may be primary or secondary. If the primary single distribution is larger than the multiple distribution, then blocks from the multiple distribution are added to nodes in a process that may be conceptualized as iteratively placing the multiple distribution over the primary distribution and shifting until the multiple distribution has covered the whole primary distribution. At each iteration, a multiple distribution block that is over a primary distribution block is entered into the same node containing the primary distribution block.

[0122] Distributions may also have a transpose attribute. The transpose attribute indicates that the distribution is transposed before the overlaying process is applied.

[0123] FIG. 13 shows exemplary memory regions used in a matrix multiplication problem involving three 2-dimensional memory regions, A, B, and C. Assume that each memory region has row and column sizes such that the memory regions are divided into square blocks as shown in FIG. 13. The operation A*B=C can be performed in parallel using several different approaches. First, consider an approach in which each block of C is written by a different thread. A block in C is formed by multiplying the blocks in the corresponding row of A by the corresponding column of blocks in B. In this example, the dashed lines represent the distributions created by the user.

[0124] For the 3×3 case depicted in FIG. 13, since C has a primary single distribution, the development tool 522 establishes a node in a DAG for each of the nine blocks. In response to the secondary multiple row distribution on A and the secondary multiple column distribution on B, the development tool 522 adds the rows of A and columns of B to nodes as explained above. For example, when the C(1,1) block is added to the node, the A(1,1) and B(1,1) blocks are also added. Because the A(1,1) block is in a secondary multiple row distribution, all of the blocks in that row are also added to the same node. Similarly, because the B(1,1) block is in a secondary multiple column distribution, all of the blocks in that column are added to the same node.

[0125] The resulting nodes that the development tool 522 creates are shown in the table below. In the table, the ordered pair specifies the row and column of each block added, and the hyphen (“-”) specifies a range of rows or columns when more than one block is added from a distribution.

Node      Blocks Added
Node 1    C(1,1), A(1,1-3), B(1-3,1)
Node 2    C(1,2), A(1,1-3), B(1-3,2)
Node 3    C(1,3), A(1,1-3), B(1-3,3)
Node 4    C(2,1), A(2,1-3), B(1-3,1)
Node 5    C(2,2), A(2,1-3), B(1-3,2)
Node 6    C(2,3), A(2,1-3), B(1-3,3)
Node 7    C(3,1), A(3,1-3), B(1-3,1)
Node 8    C(3,2), A(3,1-3), B(1-3,2)
Node 9    C(3,3), A(3,1-3), B(1-3,3)
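The overlay process that produces the table above may be sketched in Python as follows. This is an illustrative model rather than the tool's implementation: the function names `expand` and `build_nodes` and the distribution labels are hypothetical, and the sketch assumes square n×n regions with a primary single distribution covering all of C.

```python
def expand(kind, block, n):
    """Blocks contributed by the block lying over a primary single block."""
    row, col = block
    if kind == "single":
        return [block]
    if kind == "multiple_row":        # every block in that block's row
        return [(row, k) for k in range(1, n + 1)]
    if kind == "multiple_column":     # every block in that block's column
        return [(k, col) for k in range(1, n + 1)]
    if kind == "all":                 # every block in the distribution
        return [(r, c) for r in range(1, n + 1) for c in range(1, n + 1)]
    raise ValueError(f"unknown distribution kind: {kind}")


def build_nodes(n, overlays):
    """Create one node per block of C and add the overlaid blocks of other regions."""
    nodes = []
    for row in range(1, n + 1):
        for col in range(1, n + 1):
            node = {"C": [(row, col)]}
            for region, kind in overlays.items():
                node[region] = expand(kind, (row, col), n)
            nodes.append(node)
    return nodes


nodes = build_nodes(3, {"A": "multiple_row", "B": "multiple_column"})
```

The first entry of `nodes` holds C(1,1), A(1,1-3), and B(1-3,1), corresponding to Node 1 of the table, and nine nodes are produced in total.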

[0126] FIG. 14 shows primary A and B distributions created for the same matrix multiply problem. The distributions shown in FIG. 14 result in the following 9 nodes:

Node      Blocks Added
Node 1    C(1,1), A(1,1), B(1,1), A(1,2-3), B(2-3,1)
Node 2    C(1,2), A(1,1), B(1,2), A(1,2-3), B(2-3,2)
Node 3    C(1,3), A(1,1), B(1,3), A(1,2-3), B(2-3,3)
Node 4    C(2,1), A(2,1), B(1,1), A(2,2-3), B(2-3,1)
Node 5    C(2,2), A(2,1), B(1,2), A(2,2-3), B(2-3,2)
Node 6    C(2,3), A(2,1), B(1,3), A(2,2-3), B(2-3,3)
Node 7    C(3,1), A(3,1), B(1,1), A(3,2-3), B(2-3,1)
Node 8    C(3,2), A(3,1), B(1,2), A(3,2-3), B(2-3,2)
Node 9    C(3,3), A(3,1), B(1,3), A(3,2-3), B(2-3,3)

[0127] As an example, the program code that executes on each node may be represented by a FORTRAN function, MATRIX_MULTIPLY, that takes as arguments the location, number of rows, and number of columns of the three matrices A, B, and C, respectively.

      CALL MATRIX_MULTIPLY(A(#AROW(A),1),#NROWS(A),#ANCOLS(A),
     $ B(1,#ACOL(B)),#ANROWS(B),#NCOLS(B),
     $ C(#AROW(C),#ACOL(C)),#NROWS(C),#NCOLS(C))

[0128] FIG. 15A shows another allocation of distributions for the matrix multiplication problem in which the programmer has determined that each thread will process a column of blocks in C. In this case, the development tool 522 creates three nodes because there are three blocks in the primary single distribution. As explained above, when the multiple column distributions are laid over the primary single distribution, each block over a primary single distribution block is added to the same node as that primary distribution block, along with the additional block in the same column of the multiple column distribution. In the example shown in FIG. 15A, the block B(2,1) of the secondary multiple column distribution of B is conceptually positioned over C(1,1). Thus, the development tool 522 adds the block B(2,1) to the node containing C(1,1). Furthermore, because block B(2,1) is part of a multiple column distribution, the block B(3,1) in the same column as B(2,1) is also added to the node containing C(1,1). Also note that when the development tool 522 adds a block from A to a node, all blocks from A are added to that node because all the blocks of A are designated as a secondary all distribution.

Node      Blocks Added
Node 1    C(1,1), B(1,1), A(1-3,1-3), C(2-3,1), B(2-3,1)
Node 2    C(1,2), B(1,2), A(1-3,1-3), C(2-3,2), B(2-3,2)
Node 3    C(1,3), B(1,3), A(1-3,1-3), C(2-3,3), B(2-3,3)

[0129] The following program code may be used to execute the multiplication:

      CALL MATRIX_MULTIPLY(A(1,1),#ANROWS(A),#ANCOLS(A),
     $ B(1,#ACOL(B)),#ANROWS(B),#NCOLS(B),
     $ C(1,#ACOL(C)),#ANROWS(C),#NCOLS(C))

[0130] FIG. 15B shows another example where the transpose of B is to be multiplied by A to form C. The transpose attribute explained above allows several of the allocations from the previous example to be reused, with modifications to the memory area B as shown in FIG. 15B.

[0131] As noted above, the development tool 522 also supports a “movement” mechanism for adding blocks in a memory area to nodes in a DAG. Turning next to FIG. 16, that figure shows three examples of the movement mechanism on a memory area M: a row movement 1602, a column movement 1604, and a combination movement 1606.

[0132] With regard to the row movement 1602, the programmer first draws (or specifies using another input mechanism such as a keyboard) the selection 1608 shown in FIG. 16. The development tool 522 then moves the selection 1608 across the memory area M until the leading edge of the selection 1608 hits a boundary of the memory area. At each position, the development tool 522 adds the blocks covered by the selection 1608 to a node in the DAG. Thus, for the row movement 1602, the development tool 522 adds three nodes to the DAG.

[0133] Similarly, with regard to the column movement 1604, the programmer first draws the selection 1610 shown in FIG. 16. The development tool 522 then moves the selection 1610 across the memory area M until the leading edge of the selection 1610 hits a boundary of the memory area. At each position, the development tool 522 adds the blocks covered by the selection 1610 to a node in the DAG. Thus, for the column movement 1604, the development tool 522 adds three nodes to the DAG.

[0134] The combination movement 1606 operates in the same fashion. In particular, the development tool 522 moves the selection 1612 over the memory area M until the leading edge of the selection 1612 hits a boundary in each direction of movement. Thus, for the combination movement 1606, the development tool 522 creates four DAG nodes, each associated with four blocks.
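The movement mechanism may be modeled as sliding a rectangular selection over the memory area and emitting one node per position. The Python sketch below is an illustration only: the function name `movement_nodes`, the 4×4 area, the 2×2 selection, and the step of one selection width per move are all assumptions, since the geometry of FIG. 16 is not reproduced here.

```python
def movement_nodes(area, origin, selection, step):
    """Slide a selection over a memory area and return one block list per position.

    area      -- (rows, cols) of the memory area, in blocks
    origin    -- (row, col) of the selection's upper left block, 1-based
    selection -- (height, width) of the selection, in blocks
    step      -- (row step, col step); a step of 0 means no movement in that direction
    """
    rows, cols = area
    height, width = selection
    row_step, col_step = step
    # The selection stops once its leading edge reaches the area boundary.
    row_starts = (list(range(origin[0], rows - height + 2, row_step))
                  if row_step else [origin[0]])
    col_starts = (list(range(origin[1], cols - width + 2, col_step))
                  if col_step else [origin[1]])
    return [
        [(r, c) for r in range(r0, r0 + height) for c in range(c0, c0 + width)]
        for r0 in row_starts
        for c0 in col_starts
    ]


# Assumed combination movement: a 2x2 selection stepping by 2 over a 4x4 area.
combination = movement_nodes((4, 4), (1, 1), (2, 2), (2, 2))
```

Under these assumptions `combination` contains four nodes of four blocks each, consistent with the combination movement 1606. A movement in a single direction uses the same function with a step of zero in the other direction.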

[0135] Methods and systems consistent with the present invention also provide visualization support for developing data flow programs. As will be explained in more detail below, the development tool 522 supports the visual representation and presentation of: code segments as one or more nodes in a DAG; attributes that signify that a code segment has already executed, is currently executing, or has not yet begun executing; dependencies of a code segment on other code segments with an attribute that signifies whether the dependency has been met; the portions of one or more data structures that are affected by a code segment; and nodes that a selected thread has executed.

[0136] For example, FIG. 17 depicts a DAG 1700 illustrating the dependency relationships corresponding to FIGS. 3A and 3B. The DAG 1700 illustrates graphically that the data associated with the blocks sharing the first state 1702 are needed for processing by each of the blocks sharing the second state 1704. In turn, the data associated with the blocks sharing the second state 1704 are needed by the groups of blocks that share the third state 1706.

[0137] In this embodiment, the development tool 522 represents an unexecuted code segment as a diamond-shaped node, an executing code segment as a square node, and an executed code segment as a circular node. The development tool 522 also represents an unmet dependency as a dashed arc and a satisfied dependency as a bolded, solid arc. One skilled in the art, however, will recognize that any change in representation of the nodes and arcs (e.g., a change in shape, color, shading, animation, sound, and the like), may be used to represent the nodes and arcs in different states. Thus, the nodes and arcs used in the methods, systems, and articles of manufacture consistent with the present invention are not limited to those illustrated. Rather, the development tool 522 generally presents an unexecuted node using an unexecuted visualization, an executing node using an executing visualization, and an executed node using an executed visualization, while representing arcs with an unsatisfied dependency visualization or a satisfied dependency visualization.

[0138] FIG. 18 depicts a flow chart of the steps performed by the data flow program development tool 522 for visualization of the state of the code segments on the DAG. Initially, the development tool 522 receives an indication to run the program (step 1802). The next step performed by the development tool 522 is to wait until a processor is available (step 1804). When a processor becomes available, the development tool 522 selects a block and its associated code from the queue (step 1806). The development tool 522 then checks to determine whether all of the dependencies for the selected block are met (step 1808). If all of the dependencies for the selected block of code are met, the development tool 522 executes the selected block on the processor (step 1810). If not all of the dependencies for the selected block are met, then the development tool 522 continues to search for a block of code that does have all of its dependencies met. As a result, the program adapts to different environments (e.g., machine load, number of threads, and the like) by executing the code segments that are ready. Thus, rather than continuing to wait on an originally selected code segment until it is ready to execute, the development tool 522 can execute code segments that become ready sooner than the originally selected code segment. When the selected block is executed, the development tool 522 modifies the node for the selected block to indicate that the code is executing (step 1812). Assuming there are three threads running in parallel, three code segments can be executed simultaneously.

[0139] Thus, as shown in FIG. 19, three of the nodes 1902, 1904 and 1906 on the DAG 1900 are square nodes to indicate that the code segments represented by the nodes are executing.

[0140] The next step performed by the development tool 522 is to wait until the execution of the block is complete (step 1814). After the execution of the code segment is complete, the development tool 522 modifies the node of the selected block to indicate that the execution is complete (step 1816). The development tool 522 also modifies the appearance of any dependency arcs out of the selected block to indicate that the dependency has been met (step 1818). Thus, after the execution of node 1902 in DAG 1900 is complete, the development tool 522 displays the node 1902 as a circular node 2002 (see the DAG 2000 in FIG. 20). In addition, the development tool 522 displays the arcs 2010, 2012, and 2014 out of node 2002 as bolded, solid arcs to indicate that the dependencies out of the node 2002 have been met.
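The node and arc updates of steps 1812, 1816, and 1818 may be modeled as simple state changes. The following Python sketch is a hypothetical model, not the display code of the development tool 522: the class name `DagView`, the style strings, and the three-node example are assumptions made for illustration.

```python
# Visualizations of this embodiment, represented here as plain strings.
UNEXECUTED, EXECUTING, EXECUTED = "diamond", "square", "circle"
UNMET, MET = "dashed", "bold-solid"


class DagView:
    def __init__(self, arcs):
        # arcs is an iterable of (parent, child) pairs; all start unmet.
        self.arc_style = {arc: UNMET for arc in arcs}
        nodes = {node for arc in self.arc_style for node in arc}
        self.node_shape = {node: UNEXECUTED for node in nodes}

    def ready(self, node):
        """True when every arc into the node shows a satisfied dependency."""
        return all(style == MET
                   for (_, child), style in self.arc_style.items()
                   if child == node)

    def start(self, node):
        if not self.ready(node):
            raise ValueError(f"{node} has unmet dependencies")
        self.node_shape[node] = EXECUTING          # step 1812

    def finish(self, node):
        self.node_shape[node] = EXECUTED           # step 1816
        for arc in self.arc_style:
            if arc[0] == node:
                self.arc_style[arc] = MET          # step 1818


# Hypothetical three-node DAG: "c" depends on "a" and "b".
view = DagView([("a", "c"), ("b", "c")])
view.start("a")
view.finish("a")
```

After these calls node "a" is drawn as a circle, the arc from "a" to "c" is bold and solid, and "c" remains a diamond because its dependency on "b" is still unmet.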

[0141] Next, the development tool 522 determines whether there are any more blocks on the queue awaiting execution (step 1820). If there are no more blocks, the processing ends. If there are more blocks available, the development tool 522 continues processing at step 1804. Returning to the example depicted in FIG. 20, because the code segment represented by node 2002 is no longer executing, a thread or processor becomes available. Thus, the development tool 522 selects the next block (represented by node 2008) from the queue. Since all dependencies for the selected block are met, the development tool 522 executes the selected block, and represents the node 2008 as a square node to indicate that the code is executing. Meanwhile, the code segments represented by nodes 2004 and 2006 continue to execute.

[0142] After the execution of the next code segment associated with a block assigned to node 2004, the development tool 522 represents the node 2004 as a circular node 2104 (see FIG. 21). The development tool 522 also modifies the arcs 2110, 2112, and 2114 to indicate that the dependencies from the code segment associated with a block assigned to node 2104 have been met. As shown in FIG. 21, the code segments represented by nodes 2102 and 2104 have been executed, while the code segments represented by nodes 2106 and 2108 are still executing. Because a processor has become available, the tool 522 selects the next block from the queue. This block is represented by node 2116.

[0143] As depicted in the DAG 2100 shown in FIG. 21, two of the dependencies for the block associated with node 2116, represented by arcs out of nodes 2106 and 2108, have not yet been met. Thus, the development tool 522 does not begin execution of the code segment associated with the block for node 2116 (and its shape remains a diamond). Rather, the development tool 522 continues to check the queue for code segments that are ready to execute. However, the only code segments ready to execute are in fact currently executing (2106 and 2108). Thus, one thread is idle while one thread executes node 2106 and one thread executes node 2108. When the threads finish, the execution of the code segments represented by nodes 2202, 2204, 2206, and 2208 is complete (see DAG 2200 depicted in FIG. 22). Also, at this point, three threads or processors are available and the development tool 522 continues to check the queue for code segments ready to execute. Thus, the development tool 522 selects and executes the code segments for blocks in the queue represented by nodes 2210, 2212 and 2214.

[0144] After execution of the code segment associated with the block represented by node 2210, the development tool 522 displays the node as a circular node 2310 (see the DAG 2300 shown in FIG. 23). At this point, the code segments associated with blocks represented by nodes 2302, 2304, 2306, 2308, and 2310 have been executed. In addition, the development tool 522 represents the dependencies out of node 2310 as solid, bolded arcs 2318, 2320, and 2322 to indicate that these dependencies are met. The development tool 522 then selects the next code segment from the queue associated with a block represented by node 2316. The development tool 522 determines that all dependencies for the selected node are met, begins execution of the code associated with the selected node, and represents the selected node as a square node 2316 to indicate that the code segment is executing. Similarly, when the execution of the code segments associated with blocks represented by nodes 2312 and 2314 is also complete, the nodes 2402, 2404, 2406, 2408, 2410, 2412, and 2414, depicted in FIG. 24, indicate that the execution of these code segments is complete. At this point, all dependencies in the DAG 2400 are met. DAG 2500 in FIG. 25 illustrates the state of all nodes and dependencies after all code segments have been executed and all dependencies have been met.

[0145] Methods and systems consistent with the present invention allow a programmer to view the dependencies of a code segment on other code segments. The development tool 522 may use different representations for a dependency that has been met and a dependency that has not yet been met (as explained above). The dependency view allows a programmer to quickly ascertain the impact of changes to the DAG on other nodes in the DAG.

[0146] FIG. 26 depicts a flow chart of the steps performed by the data flow program development tool 522 to display the dependencies of a selected code segment. The neighboring DAG portion 2602 illustrates graphically the operation of the development tool 522. Initially, the development tool 522 determines a selected block of code through keyboard or mouse input, as examples (step 2604). The selected block of code is generally associated with a block and a node in the DAG. Thus, the development tool 522 may optionally modify the appearance of the associated node in the DAG (step 2606). As examples, the associated node may change in appearance from a diamond to a square, become bolded, change its line style, and the like.

[0147] The development tool 522 then traces arcs back through the DAG from the associated node (step 2608). As the development tool 522 finds new dependencies, it highlights them for the programmer. When there are no arcs left to explore, the processing ends.
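
The arc tracing of step 2608 can be sketched roughly as follows. This is a minimal illustration only: the TraceNode type, its fields, and the trace_dependencies function are hypothetical names introduced here, not the structures of Tables 1 through 3, and the sketch assumes each node simply holds pointers to the nodes it depends on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node used only for this sketch. */
typedef struct TraceNode {
    int id;
    int ndeps;                 /* number of nodes this node depends on */
    struct TraceNode **deps;   /* arcs traced "back" toward prerequisites */
    int highlighted;           /* set once the tool has highlighted the node */
} TraceNode;

/* Walk arcs back from `selected`, highlighting every node it depends on,
 * directly or transitively. Returns the number of newly highlighted nodes. */
int trace_dependencies(TraceNode *selected)
{
    int count = 0;
    assert(selected != NULL);
    for (int i = 0; i < selected->ndeps; i++) {
        TraceNode *dep = selected->deps[i];
        if (dep->highlighted)
            continue;          /* already reached through another arc */
        dep->highlighted = 1;
        count++;
        count += trace_dependencies(dep);
    }
    return count;              /* no arcs left to explore */
}
```

The highlighted flag doubles as a visited marker, so a node reachable through several arcs is reported only once.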

[0148] The development tool 522 may also present to the programmer portions of data that are affected by a code segment. For example, the development tool 522 may show a view of the elements of a data structure, the elements of an array, and the like. As the data flow program executes, the development tool 522 highlights the data that one or more currently executing code segments are modifying.

[0149] Turning next to FIG. 27, that figure presents a flow diagram 2700 of the steps performed by the development tool 522 when presenting to the programmer portions of data that a code segment affects. The development tool 522 determines the selected data for monitoring (step 2702). Thus, as shown in the node view 2703, the programmer has selected, using the dashed selector box, a data element associated with the node. In particular, the programmer has selected the matrix M.

[0150] Subsequently, the development tool 522 provides a graphical representation of the matrix M (step 2704). As shown in the node view 2705, the matrix is shown with boxes representing its constituent elements M1, M2, M3, and M4. The development tool 522 monitors for reads and/or writes to the selected data as threads execute code segments associated with DAG nodes (step 2706). When the development tool 522 detects that the selected data has been affected by a code segment, the development tool 522 highlights or otherwise modifies the graphical representation so that the programmer can observe which parts of the selected data are changing. For example, in the node view 2709, the development tool 522 has cross-hatched elements M1 and M4 to show that an executing code segment is reading or writing to those elements.

[0151] An additional visualization option available to the programmer is the thread path view. When the programmer selects the thread path view, the development tool 522 provides the programmer with a display that shows, for each thread selected by the programmer, the set of nodes executed by that thread. As a result, the programmer can ascertain which threads are underutilized or overutilized, for example, and experiment with modifications to the data flow program that allow the data flow program to perform better.

[0152] Turning to FIG. 28, that figure presents a flow diagram 2800 of the steps performed by the development tool 522 when presenting to the programmer a thread path view. The development tool 522 determines the threads selected by the programmer (in this instance using a radio button selection) (step 2802). Thus, as shown in the selection box 2803, the programmer has selected thread 2 and thread 3.

[0153] Subsequently, the development tool 522 displays the nodes executed by the selected threads. For example, the thread path view 2805 shows that thread 2 executed nodes (1,1), (1,2), (2,2), and (2,3), and that thread 3 executed nodes (3,3) and (3,4). Alternatively, the development tool 522 may present the thread path view by highlighting nodes on a DAG in correspondence with colors, line styles, and the like assigned to threads.

[0154] The thread path view indicates which threads executed which nodes. To that end, the development tool 522 may maintain execution information during data flow program execution that is useful for presenting the thread path view. The execution information may include, as examples, a time stamp, thread identification, node identification, and the like.
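
One way such execution information might be recorded and queried is sketched below. The ExecRecord type, its field names, and the thread_path function are assumptions made for illustration; the description above states only that a time stamp, a thread identification, and a node identification may be kept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical execution record; field names are illustrative only. */
typedef struct {
    long timestamp;   /* when the node was executed */
    int  thread_id;   /* thread that executed the node */
    int  row, col;    /* node identification, e.g. node (1,2) */
} ExecRecord;

/* Copy into `out` the records that belong to `thread_id`, preserving the
 * order of the log. Returns the number of records copied (at most max_out). */
size_t thread_path(const ExecRecord *records, size_t nrecords, int thread_id,
                   ExecRecord *out, size_t max_out)
{
    size_t n = 0;
    assert(records != NULL && out != NULL);
    for (size_t i = 0; i < nrecords && n < max_out; i++) {
        if (records[i].thread_id == thread_id)
            out[n++] = records[i];
    }
    return n;
}
```

Calling thread_path once per selected thread yields the per-thread node lists that a view such as 2805 would display.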

[0155] As noted above, the development tool 522 also provides debugging functions. The debugging functions respond to debugging commands that include, as examples, the ability to step to a point in data space, the ability to single step in data space (step debugging commands), the ability to add breakpoints (breakpoint debugging commands), the ability to save program execution information for later replay (replay debugging commands), and the ability to add or delete block dependencies (dependency modification debugging commands).

[0156] FIG. 29 presents a flow diagram 2900 of the steps performed by the development tool 522 when allowing the programmer to step to a point in data space. The development tool 522 obtains from the programmer an indication (e.g., a mouse click on a DAG node, keyboard input, or the like) of the next node that the programmer wants the development tool 522 to process (step 2902). The development tool 522 then optionally highlights the selected node and determines the dependencies for the selected node (steps 2904 and 2906).

[0157] In other words, before the development tool 522 executes the code for the selected node, the development tool 522 first satisfies the dependencies for the selected node (step 2908). Once the dependencies for the selected node are satisfied, the development tool 522 executes the code for the selected node (step 2910). Processing then stops and the programmer may review the results obtained by execution of the selected node.
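
Steps 2906 through 2910 can be sketched as a depth-first walk that runs every unexecuted prerequisite before the selected node. The StepNode type and the step_to function are hypothetical; in particular, setting the done field merely stands in for executing the node's code segment, and the caller is assumed to supply an order array large enough for every node that may run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node used only for this sketch. */
typedef struct StepNode {
    int id;
    int done;                 /* nonzero once the node has been executed */
    int ndeps;                /* number of prerequisite nodes */
    struct StepNode **deps;   /* nodes that must execute first */
} StepNode;

/* Execute every unexecuted prerequisite of `node`, then `node` itself.
 * The ids are appended to `order` starting at index `n`; the new count
 * is returned. */
size_t step_to(StepNode *node, int *order, size_t n)
{
    assert(node != NULL && order != NULL);
    if (node->done)
        return n;                             /* nothing left to satisfy */
    for (int i = 0; i < node->ndeps; i++)
        n = step_to(node->deps[i], order, n); /* satisfy dependencies first */
    node->done = 1;                           /* stands in for running the code */
    order[n++] = node->id;
    return n;
}
```

Because already executed nodes return immediately, repeating the request for the same node performs no further work.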

[0158] Turning next to FIG. 30, that figure illustrates a flow diagram 3000 of the steps performed by the development tool 522 when allowing the programmer to single step the execution of a data flow program. The development tool 522 pauses execution of the data flow program and waits for an indication from the programmer to perform a single step (steps 3002 and 3004). When the development tool 522 receives the indication, the development tool 522 selects and executes code for the next node in the queue (step 3006). Processing then stops and the programmer may review the results obtained by execution of the selected node.

[0159] With regard next to FIG. 31, that figure illustrates a flow diagram 3100 of the steps performed by the development tool 522 when allowing the programmer to save and replay program execution information. The development tool 522 pauses execution of the data flow program and outputs DAG status information to secondary storage (e.g., a file) (steps 3102 and 3104). The DAG status information provides a history of execution of DAG nodes which the development tool 522 may use to replay (e.g., visually on a display) execution of nodes over time. To that end, the development tool 522 may save, as examples, the DAG structure, node timestamps of execution, breakpoints, thread identifications for executed nodes, dependency status, programmer selected step points, ordering of nodes in the queue, and the like as DAG status information.

[0160] Thus, when the development tool 522 receives a replay indication from the programmer, the development tool 522 loads DAG status information from the secondary storage (steps 3106 and 3108). The development tool 522 may then replay node execution (e.g., by presenting a visual representation of a DAG over time) by highlighting (or displaying as text output) the execution of nodes in the DAG over time (step 3110).

[0161] With regard next to FIG. 32, that figure illustrates a flow diagram 3200 of the steps performed by the development tool 522 when allowing the programmer to add or delete dependencies. The development tool 522 pauses execution of the data flow program and receives an indication of a dependency to add or delete (steps 3202 and 3204). For example, FIG. 32 shows the programmer using a pointer to specify deletion of dependency 3206 (from node (1,1) to node (1,2)), while adding a dependency 3208 (from node (1,3) to node (1,2)).

[0162] In response, the development tool 522 adds or deletes the specified dependencies and enqueues the blocks for processing (steps 3210 and 3212). Execution continues using the newly added or removed dependencies (step 3214). Thus, the programmer, when faced with incorrect execution of a data flow program under development, may investigate the cause of the problem, find that a dependency is missing, and add the dependency. Similarly, the programmer may find that a dependency is not in fact necessary and delete the dependency to investigate whether performance improves.
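
A minimal sketch of the add and delete operations of step 3210 follows. The EditNode type, the MAX_DEPS limit, and the three functions are hypothetical. The cycle check in add_dependency is an assumption of this sketch rather than a step stated above; it is included only because the graph must remain acyclic for execution to complete.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 8   /* arbitrary limit chosen for this sketch */

/* Hypothetical, simplified node used only for this sketch. */
typedef struct EditNode {
    int id;
    int ndeps;
    struct EditNode *deps[MAX_DEPS];   /* nodes this node depends on */
} EditNode;

/* Returns 1 if `node` depends on `target`, directly or transitively. */
int depends_on(const EditNode *node, const EditNode *target)
{
    for (int i = 0; i < node->ndeps; i++) {
        if (node->deps[i] == target || depends_on(node->deps[i], target))
            return 1;
    }
    return 0;
}

/* Make `node` depend on `prereq`. Returns 1 on success, 0 if rejected. */
int add_dependency(EditNode *node, EditNode *prereq)
{
    assert(node != NULL && prereq != NULL);
    if (node == prereq || depends_on(prereq, node))
        return 0;                      /* would create a cycle */
    for (int i = 0; i < node->ndeps; i++) {
        if (node->deps[i] == prereq)
            return 0;                  /* dependency already present */
    }
    if (node->ndeps == MAX_DEPS)
        return 0;                      /* no room left */
    node->deps[node->ndeps++] = prereq;
    return 1;
}

/* Remove the dependency of `node` on `prereq`. Returns 1 if one was removed. */
int delete_dependency(EditNode *node, const EditNode *prereq)
{
    assert(node != NULL && prereq != NULL);
    for (int i = 0; i < node->ndeps; i++) {
        if (node->deps[i] == prereq) {
            node->deps[i] = node->deps[--node->ndeps];  /* fill the gap */
            return 1;
        }
    }
    return 0;
}
```

After either operation, a tool along these lines would re-enqueue the affected blocks so that execution continues with the updated dependencies.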

[0163] As noted above, the development tool also supports breakpoints. In one implementation, the development tool provides 1) one point, 2) none after, 3) all before, and 4) task node breakpoints specified on nodes. A “one point” breakpoint halts execution of the data flow program when the specified node is selected for execution. A “none after” breakpoint halts execution when a thread selects for execution any node in the DAG after the specified node. An “all before” breakpoint halts execution when all nodes before the specified node complete execution (note that some nodes after the specified node may also complete, depending on the order of node execution). A “task node” breakpoint halts execution when a thread selects a node for execution that is associated with code that performs a designated task (e.g., a matrix multiplication). Breakpoints may be used in combination on the same node, for example, a “one point” breakpoint may be used with a “none after” breakpoint or an “all before” breakpoint, or both.

[0164] With reference next to FIG. 33, that figure illustrates a flow diagram 3300 of the steps performed by the development tool 522 when setting and checking breakpoints. The development tool 522 receives a node and breakpoint type indication, and in response sets the breakpoint for the node (steps 3302 and 3304). Then, during execution of the data flow program, the development tool 522 monitors for breakpoint conditions to be met (step 3306). When the development tool 522 determines that the conditions for any particular breakpoint are met, the development tool 522 halts the data flow program (step 3308).
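
The condition check of step 3306 might look roughly like the sketch below for the four breakpoint types. All type and function names are hypothetical. The sketch also assumes that a node is "after" the specified node when it depends on that node directly or transitively, and that the nodes "before" a node are its direct and transitive prerequisites; the description does not define these terms more precisely.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BP_ONE_POINT, BP_NONE_AFTER, BP_ALL_BEFORE, BP_TASK_NODE } BpType;

/* Hypothetical, simplified node used only for this sketch. */
typedef struct BpNode {
    long process;            /* task index identifying what the node does */
    int done;                /* nonzero once the node has executed */
    int ndeps;
    struct BpNode **deps;    /* nodes this node depends on */
} BpNode;

typedef struct {
    BpType type;
    const BpNode *node;      /* specified node (unused for BP_TASK_NODE) */
    long process;            /* designated task (BP_TASK_NODE only) */
} Breakpoint;

/* Returns 1 if `n` depends on `target`, directly or transitively. */
int is_after(const BpNode *n, const BpNode *target)
{
    for (int i = 0; i < n->ndeps; i++) {
        if (n->deps[i] == target || is_after(n->deps[i], target))
            return 1;
    }
    return 0;
}

/* Returns 1 if every direct and transitive prerequisite of `n` has executed. */
int all_before_done(const BpNode *n)
{
    for (int i = 0; i < n->ndeps; i++) {
        if (!n->deps[i]->done || !all_before_done(n->deps[i]))
            return 0;
    }
    return 1;
}

/* Returns 1 if `bp` should halt the program, given that a thread has just
 * selected `selected` for execution. */
int breakpoint_hit(const Breakpoint *bp, const BpNode *selected)
{
    assert(bp != NULL && selected != NULL);
    switch (bp->type) {
    case BP_ONE_POINT:  return selected == bp->node;
    case BP_NONE_AFTER: return is_after(selected, bp->node);
    case BP_ALL_BEFORE: return all_before_done(bp->node);
    case BP_TASK_NODE:  return selected->process == bp->process;
    }
    return 0;
}
```

Combined breakpoints on the same node can be modeled by keeping several Breakpoint entries and halting when any of them reports a hit.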

[0165] The development tool 522 may display the progress of the data flow program, including breakpoints, to the programmer. For example, as shown in FIG. 34, the DAG 3400 illustrates that the programmer has selected node (1,3) as a “one point” breakpoint. During execution, threads first execute nodes (1,1), (2,1), (3,1), and (4,1). A thread then selects and executes node (1,2). At this point, the specified breakpoint still has not been reached. However, assuming that the next thread selects node (1,3) for execution, the development tool 522 recognizes that the “one point” breakpoint has been reached, and halts execution of the data flow program. FIG. 35 shows the state of the DAG when the breakpoint is reached (with circular nodes representing executed nodes).

[0166] In one embodiment, the pseudocode ‘C’ structure shown in Table 1 may be used to represent a node in the DAG:

TABLE 1

    typedef struct final_dag_node {
        long doneflag;                     /* clear when node has been processed */
        long takenflag;                    /* set when claimed by a thread */
        long process;                      /* process index */
        long nregions;                     /* number of regions */
        nodeRegion *regions;               /* the regions for this node */
        long numdepend;                    /* number of dependency groups */
        struct dependency_group *depend;   /* pointers to dependency group */
        long recursion_level;              /* level this node is at */
        struct final_dag_node *parent;     /* parent if in a subdag */
        struct final_dag_node *next;       /* link to next node in the queue */
        long endflag;                      /* set for nodes with no dependents */
        long level;                        /* depth of dag calls */
        struct final_dag_node *preferred;  /* link to the preferred next node */
        long pref_priority;                /* the priority to assign to the preferred node */
    } FinalDagNode;

[0167] Note that the node structure includes the process (which identifies what task to do), the data regions that will be acted on, the data dependencies which point at the nodes that are needed before this node can execute, and additional status fields.

[0168] An exemplary pseudocode ‘C’ structure shown in Table 2 may be used to define data assigned to blocks:

TABLE 2

    typedef struct node_regions {
        long ndims;                   /* number of dimensions */
        long start[MAX_DIMENSIONS];   /* starting index */
        long end[MAX_DIMENSIONS];     /* ending index */
        objectSize *osize;            /* pointer to size object */
    } nodeRegion;

[0169] Dependencies may be stored in groups, as shown by the pseudocode ‘C’ structure in Table 3. Each group may include an array of pointers to nodes that the node in question is dependent on.

TABLE 3

    typedef struct dependency_group {
        long ndeps;                      /* number of dependencies */
        FinalDagNode **depend;           /* pointers to nodes in dependencies */
        struct dependency_group *next;   /* link to next group for the node */
    } DependencyGroup;
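
To illustrate how a linked list of dependency groups might be consulted before a node executes, the following sketch walks every group of a node and reports whether all of the nodes it depends on have executed. The GNode and GGroup types are reduced, hypothetical stand-ins for the structures of Tables 1 and 3. The done field, which is nonzero once a node has executed, is an assumption of this sketch and is not the doneflag field of Table 1.

```c
#include <assert.h>
#include <stddef.h>

typedef struct GNode GNode;

/* Reduced stand-in for the dependency group of Table 3. */
typedef struct GGroup {
    long ndeps;             /* number of dependencies in this group */
    GNode **depend;         /* pointers to the nodes depended on */
    struct GGroup *next;    /* link to the next group for the node */
} GGroup;

/* Reduced stand-in for the DAG node of Table 1. */
struct GNode {
    int done;               /* assumption: nonzero once the node has executed */
    GGroup *depend;         /* first dependency group, or NULL if none */
};

/* Returns 1 if every node in every dependency group of `node` has executed. */
int dependencies_met(const GNode *node)
{
    assert(node != NULL);
    for (const GGroup *g = node->depend; g != NULL; g = g->next) {
        for (long i = 0; i < g->ndeps; i++) {
            if (!g->depend[i]->done)
                return 0;   /* at least one dependency is still unmet */
        }
    }
    return 1;
}
```

A node with no dependency groups is reported as ready, which matches nodes at the start of the DAG.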

[0170] Methods, systems, and articles of manufacture consistent with the present invention enable a programmer to easily develop data flow programs and to convert existing control flow programs according to the data flow model. By permitting programmers to define memory regions and divide them into blocks with corresponding states (each related to particular control flow program instructions), the interface facilitates the development of a data flow program for execution in a multiprocessor environment.

[0171] The foregoing description of an implementation of the invention has been presented for purposes of illustration and description. It is not exhaustive and does not limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practicing the invention. For example, the described implementation includes software, but the present invention may be implemented as a combination of hardware and software or in hardware alone. The invention may be implemented with both object-oriented and non-object-oriented programming systems. The claims and their equivalents define the scope of the invention.

What is claimed is:
 1. A method in a data processing system for developing a data flow program comprising code segments that operate on data in memory, the method comprising the steps of: dividing the memory into blocks; assigning at least a portion of the data and at least one code segment to each block; determining whether dependencies exist among the blocks such that a first block depends on data assigned to a second block; and displaying a graph comprising the blocks and the determined dependencies.
 2. A method according to claim 1, wherein the step of displaying comprises the step of displaying a graph comprising nodes assigned to the blocks and dependency arcs representing the determined dependencies.
 3. A method according to claim 2, wherein the step of displaying further comprises the step of presenting the dependency arcs using a satisfied dependency visualization when the determined dependency is satisfied, and presenting the dependency arcs using an unsatisfied dependency visualization when the determined dependency is unsatisfied.
 4. A method according to claim 2, further comprising the steps of: receiving a node selection specifying a selected one of the nodes; determining unmet dependencies for the selected node; and displaying in a visually distinctive manner the unmet dependencies in the graph.
 5. A method according to claim 2, further comprising the steps of: providing for execution of the code segments using threads; receiving a thread selection specifying at least one of the threads; and displaying nodes executed by the at least one thread.
 6. A method according to claim 1, wherein the nodes include executed nodes and unexecuted nodes, and wherein the step of displaying further comprises the step of displaying the unexecuted nodes using an unexecuted visualization and the executed nodes using an executed visualization.
 7. A method according to claim 1, wherein the data includes a data structure, and wherein the step of displaying further comprises the step of: facilitating visualization of at least a portion of the data structure accessed by at least one of the code segments by graphically presenting at least a portion of the data structure and accentuating the portion of the data structure accessed by the at least one code segment.
 8. A method in a data processing system for developing a data flow program comprising code segments distributed between memory blocks, the method comprising the steps of: representing the data flow program as a graph comprising nodes and node dependencies between the nodes; and displaying the graph to facilitate visualization of the data flow program.
 9. A method according to claim 8, wherein the nodes include executed nodes and unexecuted nodes, and wherein the step of displaying comprises the step of displaying the unexecuted nodes with an unexecuted visualization and displaying the executed nodes with an executed visualization.
 10. A method according to claim 9, wherein the nodes include executing nodes, and wherein the step of displaying comprises the step of displaying the executing nodes with an executing visualization.
 11. A method according to claim 8, wherein the node dependencies include satisfied dependencies and unsatisfied dependencies, and wherein the step of displaying comprises the steps of displaying the unsatisfied dependencies using an unsatisfied dependency visualization, and displaying the satisfied dependencies using a satisfied dependency visualization.
 12. A computer-readable medium containing instructions that cause a data processing system to perform a method for developing a data flow program comprising code segments that operate on data in memory, the method comprising the steps of: dividing the memory into blocks; assigning at least a portion of the data and at least one code segment to each block; determining a dependency imparted by a first block depending on data assigned to a second block; and displaying a graph comprising the blocks and the determined dependency.
 13. A computer-readable medium according to claim 12, wherein the step of displaying comprises the step of displaying a graph comprising nodes assigned to the blocks and a dependency arc representing the determined dependency.
 14. A computer-readable medium according to claim 12, wherein the step of displaying further comprises the step of presenting the dependency arc using a satisfied dependency visualization when the determined dependency is satisfied, and presenting the dependency arc using an unsatisfied dependency visualization when the determined dependency is unsatisfied.
 15. A computer-readable medium according to claim 13, further comprising the steps of: receiving a node selection specifying a selected node; determining unmet dependencies for the selected node; and highlighting in the graph the unmet dependencies.
 16. A computer-readable medium according to claim 13, further comprising the steps of: providing for execution of the code segments using threads; receiving a thread selection specifying at least one of the threads; and displaying nodes executed by the at least one thread.
 17. A computer-readable medium according to claim 12, wherein the nodes include executed nodes and unexecuted nodes, and wherein the step of displaying further comprises the step of presenting the unexecuted nodes using an unexecuted visualization and the executed nodes using an executed visualization.
 18. A computer-readable medium according to claim 12, wherein the data includes a data structure, and wherein the step of displaying further comprises the step of: facilitating visualization of at least a portion of the data structure accessed by at least one of the code segments by graphically presenting at least a portion of the data structure and accentuating the portion of the data structure accessed by the at least one code segment.
 19. A method in a data processing system for developing a data flow program comprising code segments that operate on data in a memory, the method comprising the steps of: dividing into blocks the memory that stores the data; for each block, assigning at least a portion of the data to the block and assigning at least one of the code segments to the block; storing data read and data write identifiers for each code segment, the data read and data write identifiers identifying at least a portion of the data read or written by the code segment; determining whether dependencies exist among the blocks such that a first block depends on data assigned to a second block using the read and write identifiers; generating a directed acyclic graph comprising nodes and arcs between the nodes by assigning the blocks to the nodes and by assigning the dependencies to the arcs; displaying the directed acyclic graph; initiating execution of the code segments; while the code segments are executing, determining which nodes in the graph are unexecuted nodes and which nodes in the graph are executed nodes; and displaying the unexecuted nodes in a manner visually distinctive from the executed nodes.
 20. A data processing system comprising: a memory comprising a data flow program and a data flow development tool that associates data processed by the data flow program to blocks in the memory, associates code segments of the data flow program to at least one of the blocks, determines dependencies between the blocks, and displays a graph comprising nodes and arcs depicting the dependencies between the blocks; and a processor that runs the data flow development tool.
 21. The data processing system of claim 20, wherein the nodes comprise executed nodes and unexecuted nodes, and wherein the executed nodes are displayed using an executed node visualization and the unexecuted nodes are displayed using an unexecuted node visualization.
 22. The data processing system of claim 20, wherein the arcs comprise satisfied dependency arcs and unsatisfied dependency arcs, and wherein the satisfied dependency arcs are displayed using a satisfied dependency visualization and the unsatisfied dependency arcs are displayed using an unsatisfied dependency visualization.
 23. A data processing system for developing a data flow program comprising code segments that operate on data in memory, the data processing system comprising: means for apportioning a memory into regions and associating the data and the code segments with the regions; means for determining dependencies between the regions; and means for displaying a graph of nodes that are assigned regions, and arcs depicting the dependencies between the regions.
 24. A computer readable memory device encoded with a data structure accessed by a data flow development tool run by a processor in a system, the data structure comprising: nodes assigned to data processed by a data flow program and to code segments of the data flow program; and dependencies between nodes, wherein the development tool accesses the data structure to provide a visualization of the data flow program.
 25. A computer readable memory device according to claim 24, wherein the data structure further comprises: a processed flag that indicates whether at least one of the nodes is executed or unexecuted.
 26. A computer readable memory device according to claim 24, wherein the data structure further comprises: a taken flag that indicates whether at least one of the nodes has been claimed by a thread.